AITopics | chemical information and modeling

Collaborating Authors

chemical information and modeling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Virtual Screening under Structural Uncertainty via Alignment and Aggregation

Neural Information Processing SystemsJun-22-2026, 14:02:41 GMT

Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods--whether physics-based or deep learning-based--are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from structures, thereby enhancing robustness to pocket localization error; and (2) a cross-attention based adapter for dynamically aggregating candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach in advancing firstin-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes. Our implementation is publicly available at https: //github.com/Wiley-Z/AANet.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

725215ed82ab6306919b485b81ff9615-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 08:44:50 GMT

conformer, molecule, torsion angle, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > Wales (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

39781da4b5d05bc2908ce08e43bc6404-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 09:24:53 GMT

compound, library, synthon, (12 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Schneckenreiter, Lisa, Luukkonen, Sohvi, Friedrich, Lukas, Kuhn, Daniel, Klambauer, Günter

arXiv.org Machine LearningJan-15-2026

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for pre-defined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves state-of-the-art zero-shot virtual screening performance in settings where no binding pocket information is provided as input, substantially outperforms existing methods on a challenging target fishing task, and demonstrates competitive ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

machine learning, natural language, prediction, (18 more...)

arXiv.org Machine Learning

2601.09693

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.88)

Add feedback

ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space

Park, Jun-Hyoung, Song, Ho-Jun, Lee, Seong-Whan

arXiv.org Artificial IntelligenceNov-19-2025

Deep learning-based molecular generation models have shown great potential in efficiently exploring vast chemical spaces by generating potential drug candidates with desired properties. However, these models often produce chemically invalid molecules, which limits the usable scope of the learned chemical space and poses significant challenges for practical applications. To address this issue, we propose ChemFixer, a framework designed to correct invalid molecules into valid ones. ChemFixer is built on a transformer architecture, pre-trained using masking techniques, and fine-tuned on a large-scale dataset of valid/invalid molecular pairs that we constructed. Through comprehensive evaluations across diverse generative models, ChemFixer improved molecular validity while effectively preserving the chemical and biological distributional properties of the original outputs. This indicates that ChemFixer can recover molecules that could not be previously generated, thereby expanding the diversity of potential drug candidates. Furthermore, ChemFixer was effectively applied to a drug-target interaction (DTI) prediction task using limited data, improving the validity of generated ligands and discovering promising ligand-protein pairs. These results suggest that ChemFixer is not only effective in data-limited scenarios, but also extensible to a wide range of downstream tasks. Taken together, ChemFixer shows promise as a practical tool for various stages of deep learning-based drug discovery, enhancing molecular validity and expanding accessible chemical space.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JBHI.2025.3593825

2511.13758

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Assay2Mol: large language model-based drug design using BioAssay context

Deng, Yifan, Ericksen, Spencer S., Gitter, Anthony

arXiv.org Artificial IntelligenceNov-18-2025

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offer rich information for drug discovery campaigns but has been untapped because of that unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.emnlp-main.1390

2507.12574

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.92)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

Cremer, Julian, Le, Tuan, Ghahremanpour, Mohammad M., Sługocka, Emilia, Menezes, Filipe, Clevert, Djork-Arné

arXiv.org Artificial IntelligenceNov-7-2025

We present FLOWR:root, an equivariant flow-matching model for pocket-aware 3D ligand generation with joint binding affinity prediction and confidence estimation. The model supports de novo generation, pharmacophore-conditional sampling, fragment elaboration, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, followed by refinement on curated co-crystal datasets and parameter-efficient finetuning for project-specific adaptation. FLOWR:root achieves state-of-the-art performance in unconditional 3D molecule generation and pocket-conditional ligand design, producing geometrically realistic, low-strain structures. The integrated affinity prediction module demonstrates superior accuracy on the SPINDR test set and outperforms recent models on the Schrodinger FEP+/OpenFE benchmark with substantial speed advantages. As a foundation model, FLOWR:root requires finetuning on project-specific datasets to account for unseen structure-activity landscapes, yielding strong correlation with experimental data. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering molecular design toward higher-affinity compounds. Case studies validate this: selective CK2$α$ ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies, while ER$α$, TYK2 and BACE1 scaffold elaboration demonstrates strong agreement with QM calculations. By integrating structure-aware generation, affinity estimation, and property-guided sampling, FLOWR:root provides a comprehensive foundation for structure-based drug design spanning hit identification through lead optimization.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.02578

Country:

Europe > Germany (0.46)
Europe > Italy (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Data Science (0.67)

Add feedback

AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation

Zhu, Wenyu, Wang, Jianhui, Gao, Bowen, Jia, Yinjun, Tan, Haichuan, Zhang, Ya-Qin, Ma, Wei-Ying, Lan, Yanyan

arXiv.org Artificial IntelligenceOct-31-2025

Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods--whether physics-based or deep learning-based--are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from structures, thereby enhancing robustness to pocket localization error; and (2) a cross-attention based adapter for dynamically aggregating candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach in advancing first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes. Our implementation is publicly available at https://github.com/Wiley-Z/AANet.

artificial intelligence, cavity, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2506.05768

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Re-evaluating sample efficiency in de novo molecule generation

Thomas, Morgan, O'Boyle, Noel M., Bender, Andreas, De Graaf, Chris

arXiv.org Artificial IntelligenceOct-30-2025

De novo molecule generation can suffer from data inefficiency; requiring large amounts of training data or many sampled data points to conduct objective optimization. The latter is a particular disadvantage when combining deep generative models with computationally expensive molecule scoring functions (a.k.a. oracles) commonly used in computer-aided drug design. Recent works have therefore focused on methods to improve sample efficiency in the context of de novo molecule drug design, or to benchmark it. In this work, we discuss and adapt a recent sample efficiency benchmark to better reflect realistic goals also with respect to the quality of chemistry generated, which must always be considered in the context of small-molecule drug design; we then re-evaluate all benchmarked generative models. We find that accounting for molecular weight and LogP with respect to the training data, and the diversity of chemistry proposed, re-orders the ranking of generative models. In addition, we benchmark a recently proposed method to improve sample efficiency (Augmented Hill-Climb) and found it ranked top when considering both the sample efficiency and chemistry of molecules generated. Continual improvements in sample efficiency and chemical desirability enable more routine integration of computationally expensive scoring functions on a more realistic timescale.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.01385

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Data-driven approach to the design of complexing agents for trivalent transuranium elements

Karpov, Kirill V., Pikulin, Ivan S., Bokov, Grigory V., Mitrofanov, Artem A.

arXiv.org Artificial IntelligenceSep-29-2025

The properties of complexes with transuranium elements have long been the object of research in various fields of chemistry. However, their experimental study is complicated by their rarity, high cost and special conditions necessary for working with such elements, and the complexity of quantum chemical calculations does not allow their use for large systems. To overcome these problems, we used modern machine learning methods to create a novel neural network architecture that allows to use available experimental data on a number of elements and thus significantly improve the quality of the resulting models. We also described the applicability domain of the presented model and identified the molecular fragments that most influence the stability of the complexes.

artificial intelligence, machine learning, stability constant, (17 more...)

arXiv.org Artificial Intelligence

2509.21362

Country:

North America > United States (0.28)
Europe > Netherlands (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Power Industry (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback